Markov decision processes (MDPs) and simple stochastic games (SSGs) provide arich mathematical framework to study many important problems related toprobabilistic systems. MDPs and SSGs with finite-horizon objectives, where thegoal is to maximize the probability to reach a target state in a given finitetime, is a classical and well-studied problem. In this work we consider thestrategy complexity of finite-horizon MDPs and SSGs. We show that for all$\epsilon>0$, the natural class of counter-based strategies require at most$\log \log (\frac{1}{\epsilon}) + n+1$ memory states, and memory of size$\Omega(\log \log (\frac{1}{\epsilon}) + n)$ is required. Thus our bounds areasymptotically optimal. We then study the periodic property of optimalstrategies, and show a sub-exponential lower bound on the period for optimalstrategies.
展开▼